Together Ho and Ha encompass all possible outcomes:
- For example:
- Ho: µ=0, Ha: µ ≠ 0
- mean equals 0 or mean does not equal 0
- Ho: µ=3700, Ha: µ ≠ 3700
- mean equals 3700 or mean does not equal 3700
- Ho: µ1 = µ2, Ha: µ1 ≠ µ2
- mean of population 1 equals mean of population 2 or it does not
- Ho: µ > 0, Ha: µ ≤ 0
- hypotheses can be directional: mean is greater than 0, or mean is less than or equal to 0
Lecture 5: Statistical hypothesis testing
Tests assess the likelihood of the null hypothesis being true
If Ho is likely false, then Ha is assumed to be correct
More precisely:
the long-run probability of obtaining the sample value (or a more extreme one) if the null hypothesis is true
p(data|Ho) - the probability of observing the data given that the null hypothesis Ho is true
Hypothesis tests
Expressed as p-value (0 to 1)
Interpret p-value as:
probability of obtaining sample value of statistic (or more extreme one) if Ho is true
High p-value:
high probability of obtaining sample statistic under Ho
if the null hypothesis (Ho) were true, you would frequently observe data similar to or more extreme than your sample statistic
your observed results are quite compatible with what the null hypothesis predicts
low p-value: low probability of obtaining sample statistic under Ho
if the null hypothesis (Ho) were true, you would rarely observe data similar to or more extreme than your sample statistic
Your results are unusual under the null hypothesis, suggesting that either you’ve witnessed a rare event or the null hypothesis may be incorrect
Statistical test results:
- p = 0.3 means that if I repeated the study 100 times, I would get this (or a more extreme) result due to chance 30 times
- p = 0.03 means that if I repeated the study 100 times, I would get this (or a more extreme) result due to chance 3 times
Which p-value suggests Ho likely false?
Statistical test results:
At what point do we reject Ho?
- p < 0.05: the conventional “significance threshold” (α, or alpha)
- p < 0.05 means:
  - if Ho is true and we repeated the study 100 times
  - we would get this (or a more extreme) result fewer than 5 times due to chance
Statistical test results:
- α is the rate at which we will reject a true null hypothesis (Type I error rate)
- Lowering α (e.g., to 0.01 or 0.001) lowers the likelihood of incorrectly rejecting a true null hypothesis
**Both hypotheses and α are specified *before* data collection and analysis**
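The claim that α is the Type I error rate can be checked with a quick simulation: a minimal sketch in R, assuming a normally distributed population where Ho (µ = 0) is actually true.

```r
# Simulate the Type I error rate: draw many samples from a population
# where Ho is TRUE (mu = 0), t-test each sample, and count how often
# we (wrongly) reject Ho at alpha = 0.05.
set.seed(42)

p_values <- replicate(10000, t.test(rnorm(n = 20, mean = 0))$p.value)

# Proportion of false rejections; should be close to alpha = 0.05
mean(p_values < 0.05)
```

Changing the threshold in the last line to 0.01 lowers the false-rejection rate accordingly.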
Traditionally α = 0.05 is used as the cutoff for rejecting the null hypothesis
There is nothing magical about 0.05
- actual p-values need to be reported
- α also needs to be decided prior to the study
| p-value range | Interpretation |
|---|---|
| p > 0.10 | No evidence against Ho; data appear consistent with Ho |
| 0.05 < p < 0.10 | Weak evidence against Ho in favor of Ha |
| 0.01 < p < 0.05 | Moderate evidence against Ho in favor of Ha |
| 0.001 < p < 0.01 | Strong evidence against Ho in favor of Ha |
| p < 0.001 | Very strong evidence against Ho in favor of Ha |
Fisher:
p-value as informal measure of discrepancy between data and Ho
“If p is between 0.1 and 0.9 there is certainly no reason to suspect the hypothesis tested. If it is below 0.02 it is strongly indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray if we draw a conventional line at .05 …”
General procedure for H testing:
- Specify null (Ho) and alternate (Ha) hypotheses
- Determine test (and test statistic) to be used
- The test statistic is used to compare your data to the expectation under Ho
- Specify the significance level (α) below which Ho will be rejected
General procedure for H testing:
- Collect data
- Perform test
- If p-value < α, conclude Ho is likely false and reject it
- If p-value ≥ α, conclude there is no evidence Ho is false and retain it
Lecture 5: Brief review
Recall…
- Major goal of statistics: inferences about populations from samples… and assign degree of confidence to inferences
- Statistical H-testing: formalized approach to inference
- Relies on specifying null hypothesis (Ho) and alternate hypothesis (Ha)
- Tests assess likelihood of the null hypothesis being true
- Expressed as p-value: probability of obtaining sample value of statistic (or more extreme one) if Ho is true
Recall hospital example
- Probability of getting a sample like A (with ȳ at least as far away from 3700 as 3500)?
- p(ȳ ≤ 3500 or ȳ ≥ 3900)
Is this a 1-tailed or 2-tailed test?
Can solve using SND and z-scores
z= (3500-3700)/410 = -0.48
From z table: p= 0.3156 X 2
p of getting a sample as far away from µ as A = 0.6312 (63.1%)
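The same calculation can be done in R with `pnorm()` instead of a z table:

```r
# Hospital example: two-tailed probability of a sample mean at least
# as far from mu = 3700 as 3500, given SE = 410
z <- (3500 - 3700) / 410   # z is about -0.488 (the table lookup used -0.48)
p_two_tailed <- 2 * pnorm(z)
p_two_tailed               # roughly 0.63
```

The small difference from 0.6312 comes from rounding z to two decimals before the table lookup.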
But: usually we can’t use z!
Can use t-distribution instead…
Pine Needle Length: Hypothesis Testing Activity
This activity will guide you through the process of conducting single-sample and two-sample t-tests on pine needle data. We’ll explore how environmental factors like wind exposure might affect pine needle length.
You’ll learn to:
- Formulate hypotheses
- Test assumptions
- Perform t-tests
- Visualize data
- Report results accurately
Pine needles from trees
Part 1: Single Sample T-test
A single sample t-test asks whether a population mean (\(\mu\), estimated by the sample mean \(\bar{x}\)) differs from some expected value.
The question: Is the average pine needle length from our windward sample different from 55mm?
One-sample t-test
Used when we want to compare a sample mean to a known or hypothesized population value.
\(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\)
where:
- \(\bar{x}\) is the sample mean
- \(\mu\) is the hypothesized population mean
- \(s\) is the sample standard deviation
- \(n\) is the sample size
# Install packages if needed (uncomment if necessary)
# install.packages("readr")
# install.packages("tidyverse")
# install.packages("car")
# install.packages("here")

# Load libraries
library(readr)     # For reading CSV files
library(tidyverse) # For data manipulation and visualization
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ purrr 1.0.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car) # For diagnostic tests
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
# Load the pine needle data
# Use the here() function to specify the path if needed
pine_data <- read_csv("data/pine_needles.csv")
Rows: 48 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): date, group, n_s, wind
dbl (2): tree_no, len_mm
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Examine the first few rows
head(pine_data)
# A tibble: 6 × 6
date group n_s wind tree_no len_mm
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 3/20/25 cephalopods n lee 1 20
2 3/20/25 cephalopods n lee 1 21
3 3/20/25 cephalopods n lee 1 23
4 3/20/25 cephalopods n lee 1 25
5 3/20/25 cephalopods n lee 1 21
6 3/20/25 cephalopods n lee 1 16
Part 1: Exploratory Data Analysis
Before conducting hypothesis tests, we should always explore our data to understand its characteristics.
Let’s calculate summary statistics and create visualizations.
Activity: Calculate basic summary statistics for pine needle length
# YOUR TASK: Calculate summary statistics for pine needle length
# Hint: Use summarize() function to calculate mean, sd, n, etc.

# Create a summary table for all pine needles
pine_summary <- pine_data %>%
  summarize(
    mean_length = mean(len_mm),
    sd_length = sd(len_mm),
    n = n(),
    se_length = sd_length / sqrt(n)
  )
print(pine_summary)
# A tibble: 1 × 4
mean_length sd_length n se_length
<dbl> <dbl> <int> <dbl>
1 17.7 3.53 48 0.509
# Now calculate summary statistics by wind exposure
# YOUR CODE HERE
Part 1: Visualizing the Data
Activity: Create visualizations of pine needle length
Create a histogram and a boxplot to visualize the distribution of pine needle length values.
Effective data visualization helps us understand:
- The central tendency
- The spread of the data
- Potential outliers
- Shape of distribution
# YOUR TASK: Create a histogram of pine needle length
# Hint: Use ggplot() and geom_histogram()

# Histogram of all pine needle lengths
ggplot(pine_data, aes(x = len_mm)) +
  geom_histogram(binwidth = 2, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Pine Needle Length",
       x = "Length (mm)",
       y = "Frequency") +
  theme_minimal()
# Boxplot of pine needle length by wind exposure
# YOUR CODE HERE
Part 1: Single Sample T-Test
We want to test if the mean pine needle length on the windward side differs from 55mm.
Activity: Define hypotheses and identify assumptions
H₀: μ = 55 (the mean pine needle length on the windward side is 55 mm)
H₁: μ ≠ 55 (the mean pine needle length on the windward side is not 55 mm)
Assumptions for t-test:
Data is normally distributed
Observations are independent
No significant outliers
Part 1: Testing Assumptions
Before conducting our t-test, we need to verify that our data meets the necessary assumptions.
Activity: Test the normality assumption
Methods to test normality:
- Visual methods: QQ plots, histograms
- Statistical tests: Shapiro-Wilk test
# Filter for just windward side needles
windward_data <- pine_data %>%
  filter(wind == "wind")

# YOUR TASK: Test normality of windward pine needle lengths
# QQ Plot
qqPlot(windward_data$len_mm,
       main = "QQ Plot for Windward Pine Needles",
       ylab = "Sample Quantiles")

# Summary statistics for the windward group
windward_data %>%
  summarize(
    mean_length = mean(len_mm),
    sd_length = sd(len_mm),
    n = n(),
    se_length = sd_length / sqrt(n)
  )
# A tibble: 1 × 4
mean_length sd_length n se_length
<dbl> <dbl> <int> <dbl>
1 14.9 1.91 24 0.390
# YOUR TASK: Conduct a single sample t-test
t_test_result <- t.test(windward_data$len_mm, mu = 55)
print(t_test_result)
One Sample t-test
data: windward_data$len_mm
t = -102.85, df = 23, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 55
95 percent confidence interval:
14.11050 15.72284
sample estimates:
mean of x
14.91667
# Calculate t-statistic manually
# YOUR CODE HERE: t = (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))
# Can you do this by hand, or manually in R?
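One possible manual check is sketched below; `needle_lengths` is a small made-up vector standing in for `windward_data$len_mm` so the snippet runs on its own.

```r
# Manual one-sample t-statistic, checked against t.test().
# needle_lengths is a hypothetical stand-in for windward_data$len_mm.
needle_lengths <- c(14, 16, 15, 13, 17, 15, 14, 16)
mu_0 <- 55   # hypothesized population mean

t_manual <- (mean(needle_lengths) - mu_0) /
  (sd(needle_lengths) / sqrt(length(needle_lengths)))

t_builtin <- unname(t.test(needle_lengths, mu = mu_0)$statistic)

c(manual = t_manual, builtin = t_builtin)   # the two values agree
```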
Part 1: Interpreting and Reporting Results
Activity: Interpret the t-test results
What does the p-value tell us?
Should we reject or fail to reject the null hypothesis?
How to report this result in a scientific paper:
“A two-tailed, one-sample t-test at α=0.05 showed that the mean pine needle length on the windward side (… mm, SD = …) [was/was not] significantly different from the expected 55 mm, t(…) = …, p = …”
Part 2: Two Sample T-Test
Now, let’s compare pine needle lengths between windward and leeward sides of trees.
Question: Is there a significant difference in needle length between the windward and leeward sides?
This requires a two-sample t-test.
Two-sample t-test compares means from two independent groups.
\(t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2_p\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\)
where:
- \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means of the two groups being compared
- \(s^2_p\) is the pooled variance: \(s^2_p = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\), where \(s_1^2\) and \(s_2^2\) are the sample variances of the two groups
- \(n_1\) and \(n_2\) are the sample sizes of the two groups
- the denominator, \(\sqrt{s^2_p(1/n_1 + 1/n_2)}\), is the pooled standard error of the difference in means
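A minimal sketch of the pooled calculation in R, with made-up stand-ins for the two groups, checked against `t.test(var.equal = TRUE)`:

```r
# Pooled two-sample t-statistic computed by hand.
# x and y are hypothetical stand-ins for the leeward and windward samples.
x <- c(20, 22, 19, 21, 23, 18)
y <- c(15, 14, 16, 13, 12, 17)
n1 <- length(x); n2 <- length(y)

# Pooled variance
s2_p <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# t-statistic: difference in means over the pooled standard error
t_manual <- (mean(x) - mean(y)) / sqrt(s2_p * (1 / n1 + 1 / n2))

t_builtin <- unname(t.test(x, y, var.equal = TRUE)$statistic)
c(manual = t_manual, builtin = t_builtin)
```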
Part 2: Exploratory Data Analysis by Group
Activity: Calculate summary statistics grouped by wind exposure
Before conducting the test, we need to understand the data for each group.
# YOUR TASK: Calculate summary statistics by wind exposure
# Hint: Use group_by() and summarize()
group_summary <- pine_data %>%
  group_by(wind) %>%
  summarize(
    mean_length = mean(len_mm),
    sd_length = sd(len_mm),
    n = n(),
    se_length = sd_length / sqrt(n)
  )
print(group_summary)
# A tibble: 2 × 5
wind mean_length sd_length n se_length
<chr> <dbl> <dbl> <int> <dbl>
1 lee 20.4 2.45 24 0.500
2 wind 14.9 1.91 24 0.390
# Calculate the difference in means
# YOUR CODE HERE
Part 2: Visualizing Group Differences
Activity: Create visualizations to compare the groups
Visualizing the data can help us understand the differences between groups.
Effective visualizations for group comparisons:
- Side-by-side boxplots
- Violin plots
- Error bar plots
# YOUR TASK: Create boxplots to compare groups
ggplot(pine_data, aes(x = wind, y = len_mm, fill = wind)) +
  geom_boxplot() +
  labs(title = "Pine Needle Length by Wind Exposure",
       x = "Wind Exposure",
       y = "Length (mm)") +
  theme_minimal()
# YOUR TASK: Create a plot using stat_summary to show means and standard errors
ggplot(pine_data, aes(x = wind, y = len_mm, fill = wind)) +
  stat_summary(fun = mean, geom = "bar") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  labs(title = "Mean Pine Needle Length by Wind Exposure",
       x = "Wind Exposure",
       y = "Mean Length (mm)") +
  theme_minimal()
Part 2: Testing Assumptions for Two-Sample T-Test
Activity: Test assumptions for two-sample t-test
For a two-sample t-test, we need to check:
1. Normality within each group
2. Equal variances between groups (for standard t-test)
3. Independent observations
If assumptions are violated:
- Welch’s t-test (unequal variances)
- Non-parametric alternatives (Mann-Whitney U test)
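As a sketch of the non-parametric route (with hypothetical data, not the pine needle measurements), `wilcox.test()` runs the Mann-Whitney U test in R:

```r
# Mann-Whitney U test (a.k.a. Wilcoxon rank-sum test): compares two
# groups by ranks, without assuming normality.
# x and y are made-up group samples.
x <- c(20, 22, 19, 21, 23, 18)
y <- c(15, 14, 16, 13, 12, 17)

wilcox.test(x, y)
```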
# YOUR TASK: Test normality of pine needle lengths
# QQ Plot (all needles combined)
qqPlot(pine_data$len_mm,
       main = "QQ Plot for Pine Needle Lengths",
       ylab = "Sample Quantiles")
[1] 4 28
# Testing normality for each group
# Leeward group
lee_data <- pine_data %>%
  filter(wind == "lee")
shapiro_lee <- shapiro.test(lee_data$len_mm)
print("Shapiro-Wilk test for leeward data:")
[1] "Shapiro-Wilk test for leeward data:"
print(shapiro_lee)
Shapiro-Wilk normality test
data: lee_data$len_mm
W = 0.95477, p-value = 0.3425
# Windward group
# YOUR CODE HERE for windward group normality test
# Test for equal variances
# YOUR TASK: Conduct Levene's test for equality of variances
levene_test <- leveneTest(len_mm ~ wind, data = pine_data)
Warning in leveneTest.default(y = y, group = group, ...): group coerced to factor.
print(levene_test)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  1.2004 0.2789
      46
# Visual check for normality with QQ plots
# YOUR CODE HERE
Part 2: Conducting the Two-Sample T-Test
Activity: Conduct a two-sample t-test
Now we can compare the mean pine needle lengths between windward and leeward sides.
H₀: μ₁ = μ₂ (The mean needle lengths are equal)
H₁: μ₁ ≠ μ₂ (The mean needle lengths are different)
Deciding between:
- Standard t-test (equal variances)
- Welch’s t-test (unequal variances)
Based on our Levene’s test result.
# YOUR TASK: Conduct a two-sample t-test
# Use var.equal = TRUE for standard t-test or var.equal = FALSE for Welch's t-test

# Standard t-test (if variances are equal)
t_test_result <- t.test(len_mm ~ wind, data = pine_data, var.equal = TRUE)
print("Standard two-sample t-test:")
[1] "Standard two-sample t-test:"
print(t_test_result)
Two Sample t-test
data: len_mm by wind
t = 8.6792, df = 46, p-value = 3.01e-11
alternative hypothesis: true difference in means between group lee and group wind is not equal to 0
95 percent confidence interval:
4.224437 6.775563
sample estimates:
mean in group lee mean in group wind
20.41667 14.91667
# Welch's t-test (if variances are unequal)
# YOUR CODE HERE

# Calculate t-statistic manually (optional)
# YOUR CODE HERE: t = (mean1 - mean2) / sqrt((s1^2/n1) + (s2^2/n2))
Part 2: Interpreting and Reporting Two-Sample T-Test Results
Activity: Interpret the results of the two-sample t-test
What can we conclude about the needle lengths on windward vs. leeward sides?
How to report this result in a scientific paper:
“A two-tailed, two-sample t-test at α=0.05 showed [a significant/no significant] difference in needle length between windward (M = …, SD = …) and leeward (M = …, SD = …) sides of pine trees, t(…) = …, p = ….”
Part 3: Paired T-Test (Extended Activity)
If we collected data in pairs (same tree, different sides), we would use a paired t-test.
How would the analysis differ?
We’d calculate the difference for each pair
Test if the mean difference equals zero
The paired approach often has more statistical power
Paired t-test formula:
\(t = \frac{\bar{d}}{s_d/\sqrt{n}}\)
where:
- \(\bar{d}\) is the mean difference
- \(s_d\) is the standard deviation of differences
- \(n\) is the number of pairs
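A sketch in R, using made-up per-tree pairs, showing that the paired t-test is exactly a one-sample t-test on the differences:

```r
# Hypothetical paired data: one lee and one wind measurement per tree
lee  <- c(21, 19, 23, 20, 22, 18)
wind <- c(16, 15, 17, 14, 16, 13)

d <- lee - wind   # per-tree differences

t.test(lee, wind, paired = TRUE)   # paired t-test
t.test(d, mu = 0)                  # same t, df, and p-value
```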
Final Activity: Assumptions of Parametric Tests
Common assumptions for t-tests:
Normality: Data comes from normally distributed populations
Equal variances (for two-sample tests)
Independence: Observations are independent
No outliers: Extreme values can influence results
What can we do if our data violates these assumptions?
Alternatives when assumptions are violated:
- Data transformation (log, square root, etc.)
- Non-parametric tests
- Bootstrapping approaches
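As one example of the bootstrapping route, a percentile confidence interval for a difference in means can be sketched as follows (made-up data, basic percentile method only):

```r
# Bootstrap 95% CI for a difference in two group means.
# x and y are hypothetical group samples.
set.seed(1)
x <- c(20, 22, 19, 21, 23, 18)
y <- c(15, 14, 16, 13, 12, 17)

boot_diffs <- replicate(5000, {
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
})

# Percentile interval: middle 95% of the resampled differences
quantile(boot_diffs, c(0.025, 0.975))
```

If the interval excludes zero, the data are hard to reconcile with "no difference", without any normality assumption.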
Summary and Conclusions
In this activity, we’ve:
Formulated hypotheses about pine needle length
Tested assumptions for parametric tests
Conducted one-sample and two-sample t-tests
Visualized data using appropriate methods
Learned how to interpret and report t-test results
Key takeaways:
Always check assumptions before conducting tests
Visualize your data to understand patterns
Report results comprehensively
Consider alternatives when assumptions are violated